library(tidyverse)
library(ggmap)
library(RColorBrewer)
library(gridExtra)
options(digits = 3)
set.seed(1234)
theme_set(theme_minimal())
ggmap
ggmap is a package for R that retrieves raster map tiles from online mapping services like Google Maps and plots them using the ggplot2 framework. The map tiles are raster because they are static image files generated previously by the mapping service. You do not need any data files containing information on things like scale, projection, or boundaries, because that information is already built into the map tile. This severely limits your ability to redraw or change the appearance of the geographic map. The tradeoff is that you can immediately focus on incorporating additional data into the map.
Google has recently changed its API requirements, and ggmap users are now required to provide an API key and enable billing. We will learn far more about APIs next week. In the meantime, I would not recommend trying to use Google Maps to obtain map images. The code below would work for you, but Google now charges you each time you obtain a map image. Stick to the other providers such as Stamen Maps.
ggmap supports open-source map providers such as OpenStreetMap and Stamen Maps, as well as the proprietary Google Maps. Obtaining map tiles requires use of the get_map() function or one of the provider-specific functions it wraps. There are two formats for specifying the mapping region you wish to obtain:
Bounding box requires the user to specify the four edges of the box (left, bottom, right, top) defining the map region. For instance, to obtain a map of Chicago using Stamen Maps:
# store bounding box coordinates
chi_bb <- c(left = -87.936287,
            bottom = 41.679835,
            right = -87.447052,
            top = 42.000835)
chicago_stamen <- get_stamenmap(bbox = chi_bb,
                                zoom = 11)
chicago_stamen
## 627x712 terrain map image from Stamen Maps. see ?ggmap to plot it.
To view the map, use ggmap():
ggmap(chicago_stamen)
The zoom argument in get_stamenmap() controls the level of detail in the map. The larger the number, the greater the detail.
get_stamenmap(bbox = chi_bb,
              zoom = 12) %>%
  ggmap()
The smaller the number, the lesser the detail.
get_stamenmap(bbox = chi_bb,
              zoom = 10) %>%
  ggmap()
Trial and error will help you decide on the appropriate level of detail depending on what data you need to visualize on the map.
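To see why a larger zoom value means more detail for the same bounding box, it can help to count how many tiles the box covers at each zoom level. The sketch below assumes the standard 256-pixel Web Mercator ("slippy map") tile scheme used by OpenStreetMap-style providers. The helpers lonlat_to_tile() and tiles_for_bbox() are hypothetical names written for this illustration and are not part of ggmap.

```r
# Convert longitude/latitude to fractional tile coordinates at a zoom level,
# assuming the standard Web Mercator slippy-map scheme
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- (lon + 180) / 360 * n
  y <- (1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n
  c(x = x, y = y)
}

# Count the tiles needed to cover a bounding box (hypothetical helper)
tiles_for_bbox <- function(bbox, zoom) {
  top_left <- floor(lonlat_to_tile(bbox[["left"]], bbox[["top"]], zoom))
  bottom_right <- floor(lonlat_to_tile(bbox[["right"]], bbox[["bottom"]], zoom))
  n_x <- bottom_right[["x"]] - top_left[["x"]] + 1
  n_y <- bottom_right[["y"]] - top_left[["y"]] + 1
  n_x * n_y
}

# Same bounding box as in the Chicago example
chi_bb <- c(left = -87.936287, bottom = 41.679835,
            right = -87.447052, top = 42.000835)

tile_counts <- sapply(10:12, function(z) tiles_for_bbox(chi_bb, z))
names(tile_counts) <- paste0("zoom_", 10:12)
tile_counts
```

Each step up in zoom roughly quadruples the number of tiles, which is why higher zoom levels take longer to download.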
Use bboxfinder.com to determine the exact longitude/latitude coordinates for the bounding box you wish to obtain.
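If you already have the points you want to plot, another option is to derive the bounding box from the data rather than looking it up by hand. This is a minimal sketch in base R. The helper bbox_from_points() and the sample coordinates are invented for illustration, and the 5% padding is an arbitrary choice.

```r
# Hypothetical helper: build a named bounding box (left, bottom, right, top)
# from point coordinates, padded by a fraction of the data range
bbox_from_points <- function(lon, lat, pad = 0.05) {
  lon_range <- range(lon, na.rm = TRUE)
  lat_range <- range(lat, na.rm = TRUE)
  lon_pad <- diff(lon_range) * pad
  lat_pad <- diff(lat_range) * pad
  c(left = lon_range[1] - lon_pad,
    bottom = lat_range[1] - lat_pad,
    right = lon_range[2] + lon_pad,
    top = lat_range[2] + lat_pad)
}

# Made-up coordinates roughly inside Chicago, for illustration only
pts <- data.frame(lon = c(-87.70, -87.62, -87.75),
                  lat = c(41.80, 41.90, 41.95))
bb <- bbox_from_points(pts$lon, pts$lat)
bb
```

The result has the same names as chi_bb, so it could be passed as the bbox argument in the same way.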
While Stamen Maps and OpenStreetMap require the bounding box format for obtaining map tiles and allow you to increase or decrease the level of detail within a single bounding box, Google Maps requires specifying the center coordinate of the map (a single longitude/latitude location) and the level of zoom or detail. zoom is an integer value from 3 (continent) to 21 (building). This means the level of detail is hardcoded to the size of the mapping region. The default zoom level is 10.
# store center coordinate
chi_center <- c(lon = -87.65, lat = 41.855)
chicago_google <- get_googlemap(center = chi_center)
ggmap(chicago_google)
get_googlemap(center = chi_center,
              zoom = 12) %>%
  ggmap()
get_googlemap(center = chi_center,
              zoom = 8) %>%
  ggmap()
Use Find Latitude and Longitude to get the exact GPS coordinates of the center location.
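Because the image size is fixed, the zoom level alone determines how much territory a center-and-zoom map covers. As a rough sketch, the longitude span can be approximated as below. It assumes 256-pixel Web Mercator tiles and a 640-pixel-wide image. Both values are assumptions for illustration, and lon_span() is a hypothetical helper, not a ggmap function.

```r
# Approximate longitude span (in degrees) of a map image at a given zoom.
# Assumes 256-pixel Web Mercator tiles and a 640-pixel-wide image.
lon_span <- function(zoom, width_px = 640, tile_px = 256) {
  360 * width_px / (tile_px * 2^zoom)
}

spans <- sapply(c(8, 10, 12), lon_span)
names(spans) <- paste0("zoom_", c(8, 10, 12))
round(spans, 3)
```

Each additional zoom level halves the span, so zoom 12 covers one quarter of the width of zoom 10.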
Each map tile provider offers a range of different types of maps depending on the background you want for the map. Stamen Maps offers several different types:
Google Maps is a bit more limited, but still offers a few major types:
See the documentation for the get_*map() function for the exact code necessary to get each type of map.
get_map() is a wrapper that automatically queries Google Maps, OpenStreetMap, or Stamen Maps depending on the function arguments and inputs. While useful, it also combines all the different arguments of get_googlemap(), get_stamenmap(), and get_openstreetmap(), and can become a bit jumbled. Use at your own risk.
Now that we can obtain map tiles and draw them using ggmap(), let’s explore how to add data to the map. The city of Chicago has an excellent data portal publishing a large volume of public records. Here we’ll look at crime data from 2017.[1] I previously downloaded a .csv file containing all the records, which I import using read_csv():
If you are copying and pasting code from this demonstration, change the read_csv() line below to
crimes <- read_csv("https://cfss.uchicago.edu/data/Crimes_-_2017.csv")
to download the file from the course website.
crimes <- read_csv("data/Crimes_-_2017.csv")
glimpse(crimes)
## Observations: 267,345
## Variables: 22
## $ ID <int> 11094370, 11118031, 11134189, 11156462,...
## $ `Case Number` <chr> "JA440032", "JA470589", "JA491697", "JA...
## $ Date <chr> "09/21/2017 12:15:00 AM", "10/12/2017 0...
## $ Block <chr> "072XX N CALIFORNIA AVE", "055XX W GRAN...
## $ IUCR <chr> "1122", "1345", "4651", "1110", "0265",...
## $ `Primary Type` <chr> "DECEPTIVE PRACTICE", "CRIMINAL DAMAGE"...
## $ Description <chr> "COUNTERFEIT CHECK", "TO CITY OF CHICAG...
## $ `Location Description` <chr> "CURRENCY EXCHANGE", "JAIL / LOCK-UP FA...
## $ Arrest <chr> "true", "true", "true", "true", "true",...
## $ Domestic <chr> "false", "false", "false", "false", "fa...
## $ Beat <chr> "2411", "2515", "0922", "2514", "1221",...
## $ District <chr> "024", "025", "009", "025", "012", "002...
## $ Ward <int> 50, 29, 12, 30, 32, 20, 9, 12, 12, 27, ...
## $ `Community Area` <int> 2, 19, 58, 19, 24, 40, 49, 30, 30, 23, ...
## $ `FBI Code` <chr> "10", "14", "26", "11", "02", "15", "03...
## $ `X Coordinate` <int> 1156443, 1138788, 1159425, 1138653, 116...
## $ `Y Coordinate` <int> 1947707, 1913480, 1875711, 1920720, 190...
## $ Year <int> 2017, 2017, 2017, 2017, 2017, 2017, 201...
## $ `Updated On` <chr> "03/01/2018 03:52:35 PM", "03/01/2018 0...
## $ Latitude <dbl> 42.0, 41.9, 41.8, 41.9, 41.9, 41.8, 41....
## $ Longitude <dbl> -87.7, -87.8, -87.7, -87.8, -87.7, -87....
## $ Location <chr> "(42.012293397, -87.699714109)", "(41.9...
Each row of the data frame is a single reported incident of crime. Geographic location is encoded in several ways, though most importantly for us the exact longitude and latitude of the incident are encoded in the Longitude and Latitude columns respectively.
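Before mapping, it can be worth checking whether any rows lack coordinates, since points without a longitude or latitude cannot be drawn. The sketch below uses a small invented data frame standing in for crimes. Whether the real file contains missing coordinates is not verified here.

```r
# Toy data frame standing in for `crimes`; all values are invented
toy <- data.frame(
  ID = 1:4,
  Longitude = c(-87.70, NA, -87.62, -87.75),
  Latitude = c(41.80, 41.90, NA, 41.95)
)

# TRUE for rows where both coordinates are present
has_coords <- complete.cases(toy[, c("Longitude", "Latitude")])

# Number of rows that could not be placed on a map
sum(!has_coords)

# Keep only the rows that can be plotted
toy_located <- toy[has_coords, ]
toy_located
```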
Let’s start with a simple high-level overview of reported crime in Chicago. First we need a map for the entire city.
chicago <- get_googlemap(center = c(lon = -87.65, lat = 41.855),
zoom = 11)
ggmap(chicago)
geom_point()
Since each row is a single reported incident of crime, we could use geom_point() to map the location of every crime in the dataset. Because ggmap() uses the map tiles (here, defined by chicago) as the basic input, we specify data and mapping inside of geom_point(), rather than inside ggplot():
ggmap(chicago) +
  geom_point(data = crimes,
             mapping = aes(x = Longitude,
                           y = Latitude))
What went wrong? All we get is a sea of black.
nrow(crimes)
## [1] 267345
Oh yeah. There were 267,345 reported incidents of crime in the city, and each incident is represented by a dot on the map. How can we make this map more usable? One option is to decrease the size and increase the transparency of each data point so dense clusters of crime become apparent:
ggmap(chicago) +
  geom_point(data = crimes,
             aes(x = Longitude,
                 y = Latitude),
             size = .25,
             alpha = .01)
Better, but still not quite as useful as it could be.
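To get a feel for why such a small alpha value brings out dense clusters, consider how opacity accumulates when points overlap. As a simplified sketch, assume n identical points sit exactly on top of one another and are blended with standard alpha compositing. The combined opacity is then 1 - (1 - alpha)^n. The function stacked_opacity() is a hypothetical helper for this illustration.

```r
# Combined opacity of n exactly overlapping points, each drawn with the same
# alpha, under standard alpha compositing (a simplifying assumption)
stacked_opacity <- function(n, alpha) {
  1 - (1 - alpha)^n
}

n_points <- c(1, 10, 100, 500)
opacity <- stacked_opacity(n_points, alpha = 0.01)
names(opacity) <- paste0("n_", n_points)
round(opacity, 3)
```

A lone incident is nearly invisible at alpha = .01, while a location with hundreds of overlapping incidents approaches solid black.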
stat_density_2d()
Instead of relying on geom_point() and plotting the raw data, a better approach is to create a heatmap. More precisely, this will be a two-dimensional kernel density estimation (KDE). In this context, KDE will take all the raw data (i.e. reported incidents of crime) and convert it into a smoothed plot showing geographic concentrations of crime. The core function in ggplot2 to generate this kind of plot is geom_density_2d():
ggmap(chicago) +
  geom_density_2d(data = crimes,
                  aes(x = Longitude,
                      y = Latitude))
By default, geom_density_2d() draws a contour plot with lines of constant value. That is, each line represents approximately the same frequency of crime all along that specific line. Contour plots are frequently used in maps (known as topographic maps) to denote elevation.
The Cadillac Mountains. Source: US Geological Survey
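To see what the two-dimensional KDE produces before it is drawn as contours, the estimate can be computed directly. The sketch below uses MASS::kde2d(), which, as far as I know, is the function ggplot2 relies on for this calculation. The points are simulated around a single made-up center and are not real crime data.

```r
library(MASS)  # ships with standard R installations

set.seed(1234)
# Simulated points forming one cluster; not real crime data
lon <- rnorm(500, mean = -87.70, sd = 0.03)
lat <- rnorm(500, mean = 41.88, sd = 0.02)

# Estimate the density on a 50 x 50 grid covering the data
dens <- kde2d(lon, lat, n = 50)

# dens$x and dens$y hold the grid coordinates,
# dens$z holds the estimated density at each grid cell
dim(dens$z)

# Grid cell with the highest estimated density
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
c(lon = dens$x[peak[1, "row"]], lat = dens$y[peak[1, "col"]])
```

Contour lines connect grid locations that share the same value of dens$z, in the same way elevation contours connect points of equal height.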
Rather than drawing lines, we can use the fill aesthetic to draw filled bands of crime density. To do that, we use the related function stat_density_2d():
ggmap(chicago) +
  stat_density_2d(data = crimes,
                  aes(x = Longitude,
                      y = Latitude,
                      fill = stat(level)),
                  geom = "polygon")
Note the two new arguments:
geom = "polygon" - change the geometric object to be drawn from a density_2d geom to a polygon geom
fill = stat(level) - the value for the fill aesthetic is the level calculated within stat_density_2d(), which we access using the stat() notation (ggplot2 3.3.0 and later spell this after_stat(level)).
This is an improvement, but we can adjust some additional settings to make the graph visually more useful. Specifically,
bins, or unique bands of color allowed on the graph
alpha, so we can still view the underlying map
brewer.pal() from the RColorBrewer package, to create a custom color palette using reds and yellows.
ggmap(chicago) +
  stat_density_2d(data = crimes,
                  aes(x = Longitude,
                      y = Latitude,
                      fill = stat(level)),
                  alpha = .2,
                  bins = 25,
                  geom = "polygon") +
  scale_fill_gradientn(colors = brewer.pal(7, "YlOrRd"))
From this map, a couple of trends are noticeable:
Because ggmap is built on ggplot2, we can use the core features of ggplot2 to modify the graph. One major feature is faceting. Let’s focus our analysis on four types of crimes with similar frequency of reported incidents[2] and facet by type of crime:
ggmap(chicago) +
  stat_density_2d(data = crimes %>%
                    filter(`Primary Type` %in% c("BURGLARY", "MOTOR VEHICLE THEFT",
                                                 "NARCOTICS", "ROBBERY")),
                  aes(x = Longitude,
                      y = Latitude,
                      fill = stat(level)),
                  alpha = .4,
                  bins = 10,
                  geom = "polygon") +
  scale_fill_gradientn(colors = brewer.pal(7, "YlOrRd")) +
  facet_wrap(~ `Primary Type`)
There is a large difference in the geographic density of narcotics crimes relative to the other categories. While burglaries, motor vehicle thefts, and robberies are reasonably prevalent all across the city, the vast majority of narcotics crimes occur on the west and south sides of the city.
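One way to pick categories with similar frequencies for a faceted comparison is to tabulate the incident types first. The sketch below uses an invented vector of crime types in base R. The counts and the 10-to-12 cutoff are made up for illustration and do not reflect the real data.

```r
# Invented crime types standing in for the `Primary Type` column
primary_type <- rep(
  c("THEFT", "BURGLARY", "ROBBERY", "NARCOTICS", "HOMICIDE"),
  times = c(50, 12, 11, 10, 1)
)

# Count incidents per type, most frequent first
type_counts <- sort(table(primary_type), decreasing = TRUE)
type_counts

# Keep the types whose counts fall in a similar range (arbitrary cutoffs)
similar <- names(type_counts)[type_counts >= 10 & type_counts <= 12]
similar
```

Comparing categories of similar size keeps any one facet from being dominated by sheer volume.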
While geom_point() was not appropriate for graphing a large number of observations in a dense geographic location, it does work rather well for less dense areas. Now let’s limit our analysis strictly to reported incidents of homicide in 2017.
(homicides <- crimes %>%
   filter(`Primary Type` == "HOMICIDE"))
## # A tibble: 671 x 22
## ID `Case Number` Date Block IUCR `Primary Type` Description
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2.31e4 JA149608 02/11/… 001XX … 0110 HOMICIDE FIRST DEGRE…
## 2 2.39e4 JA530946 11/30/… 088XX … 0110 HOMICIDE FIRST DEGRE…
## 3 2.34e4 JA302423 06/11/… 047XX … 0110 HOMICIDE FIRST DEGRE…
## 4 2.34e4 JA312425 06/18/… 006XX … 0110 HOMICIDE FIRST DEGRE…
## 5 2.37e4 JA490016 10/28/… 048XX … 0110 HOMICIDE FIRST DEGRE…
## 6 2.32e4 JA210752 04/03/… 013XX … 0110 HOMICIDE FIRST DEGRE…
## 7 2.36e4 JA461918 10/07/… 018XX … 0110 HOMICIDE FIRST DEGRE…
## 8 2.36e4 JA461918 10/07/… 018XX … 0110 HOMICIDE FIRST DEGRE…
## 9 1.08e7 JA138326 02/01/… 013XX … 0142 HOMICIDE RECKLESS HO…
## 10 2.35e4 JA364517 07/26/… 047XX … 0110 HOMICIDE FIRST DEGRE…
## # ... with 661 more rows, and 15 more variables: `Location
## # Description` <chr>, Arrest <chr>, Domestic <chr>, Beat <chr>,
## # District <chr>, Ward <int>, `Community Area` <int>, `FBI Code` <chr>,
## # `X Coordinate` <int>, `Y Coordinate` <int>, Year <int>, `Updated
## # On` <chr>, Latitude <dbl>, Longitude <dbl>, Location <chr>
We can draw a map of the city with all homicides indicated on the map using geom_point():
ggmap(chicago) +
  geom_point(data = homicides,
             mapping = aes(x = Longitude,
                           y = Latitude),
             size = 1)
Compared to our previous overviews, few if any homicides are reported downtown. We can also narrow down the geographic location to map specific neighborhoods in Chicago. First we obtain map tiles for those specific regions. Here we’ll examine North Lawndale and Kenwood.
# North Lawndale had the most homicides in 2017
# Compare to Kenwood
north_lawndale <- get_map(location = c(lon = -87.714401,
                                       lat = 41.858768),
                          zoom = 14)
kenwood <- get_map(location = c(lon = -87.59320836086425,
                                lat = 41.80965352973664),
                   zoom = 15)
ggmap(north_lawndale)
ggmap(kenwood)
To plot homicides specifically in these neighborhoods, change ggmap(chicago) to the appropriate map tile:
ggmap(north_lawndale) +
  geom_point(data = homicides,
             aes(x = Longitude, y = Latitude))
ggmap(kenwood) +
  geom_point(data = homicides,
             aes(x = Longitude, y = Latitude))
North Lawndale had the most reported homicides in 2017, whereas Kenwood had only a handful. And even though the homicides data frame contained incidents from across the entire city, ggmap() automatically cropped the graph to keep just the homicides that occurred within the bounding box.
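The cropping described above can also be done explicitly, for example to count how many incidents fall inside a neighborhood map before plotting. This is a minimal sketch in base R. The helper in_bbox(), the bounding box values, and the sample points are all invented for illustration.

```r
# Hypothetical helper: TRUE for points that fall inside a named bounding box
in_bbox <- function(lon, lat, bbox) {
  !is.na(lon) & !is.na(lat) &
    lon >= bbox[["left"]] & lon <= bbox[["right"]] &
    lat >= bbox[["bottom"]] & lat <= bbox[["top"]]
}

# Rough, made-up box around a neighborhood; not an official boundary
nl_bb <- c(left = -87.74, bottom = 41.84, right = -87.69, top = 41.88)

# Invented points: one inside, two outside, one with a missing longitude
pts <- data.frame(
  Longitude = c(-87.71, -87.60, -87.72, NA),
  Latitude = c(41.86, 41.81, 41.90, 41.85)
)

inside <- in_bbox(pts$Longitude, pts$Latitude, nl_bb)
sum(inside)
pts[inside, ]
```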
All the other aesthetic customizations of geom_point() work with ggmap. So we could expand these neighborhood maps to include all violent crime categories[3] and distinguish each type by color:
(violent <- crimes %>%
   filter(`Primary Type` %in% c("HOMICIDE",
                                "CRIM SEXUAL ASSAULT",
                                "ROBBERY")))
## # A tibble: 14,146 x 22
## ID `Case Number` Date Block IUCR `Primary Type` Description
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1.12e7 JA531910 12/01/… 022XX … 0265 CRIM SEXUAL A… AGGRAVATED:…
## 2 1.10e7 JA322389 06/25/… 003XX … 031A ROBBERY ARMED: HAND…
## 3 1.12e7 JA545986 12/11/… 007XX … 031A ROBBERY ARMED: HAND…
## 4 1.12e7 JA546734 12/12/… 007XX … 031A ROBBERY ARMED: HAND…
## 5 1.12e7 JB147188 10/08/… 092XX … 0281 CRIM SEXUAL A… NON-AGGRAVA…
## 6 1.12e7 JB147599 08/26/… 001XX … 0281 CRIM SEXUAL A… NON-AGGRAVA…
## 7 2.31e4 JA149608 02/11/… 001XX … 0110 HOMICIDE FIRST DEGRE…
## 8 1.10e7 JA378592 08/05/… 038XX … 0313 ROBBERY ARMED: OTHE…
## 9 1.12e7 JA538651 12/06/… 092XX … 031A ROBBERY ARMED: HAND…
## 10 1.12e7 JB149656 12/24/… 005XX … 0330 ROBBERY AGGRAVATED
## # ... with 14,136 more rows, and 15 more variables: `Location
## # Description` <chr>, Arrest <chr>, Domestic <chr>, Beat <chr>,
## # District <chr>, Ward <int>, `Community Area` <int>, `FBI Code` <chr>,
## # `X Coordinate` <int>, `Y Coordinate` <int>, Year <int>, `Updated
## # On` <chr>, Latitude <dbl>, Longitude <dbl>, Location <chr>
ggmap(north_lawndale) +
  geom_point(data = violent,
             aes(x = Longitude, y = Latitude,
                 color = `Primary Type`)) +
  scale_color_brewer(type = "qual", palette = "Dark2")
ggmap(kenwood) +
  geom_point(data = violent,
             aes(x = Longitude, y = Latitude,
                 color = `Primary Type`)) +
  scale_color_brewer(type = "qual", palette = "Dark2")
devtools::session_info()
## setting value
## version R version 3.5.1 (2018-07-02)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Chicago
## date 2019-01-02
##
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
## base * 3.5.1 2018-07-05 local
## base64enc 0.1-3 2015-07-28 CRAN (R 3.5.0)
## bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
## bindrcpp 0.2.2 2018-03-29 CRAN (R 3.5.0)
## bitops 1.0-6 2013-08-17 CRAN (R 3.5.0)
## broom 0.5.0 2018-07-17 CRAN (R 3.5.0)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.5.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.5.0)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
## compiler 3.5.1 2018-07-05 local
## crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
## datasets * 3.5.1 2018-07-05 local
## devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
## digest 0.6.18 2018-10-10 cran (@0.6.18)
## dplyr * 0.7.8 2018-11-10 cran (@0.7.8)
## evaluate 0.11 2018-07-17 CRAN (R 3.5.0)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.5.0)
## ggmap * 2.7.904 2018-11-14 Github (dkahle/ggmap@4dfe516)
## ggplot2 * 3.1.0 2018-10-25 cran (@3.1.0)
## glue 1.3.0 2018-07-17 CRAN (R 3.5.0)
## graphics * 3.5.1 2018-07-05 local
## grDevices * 3.5.1 2018-07-05 local
## grid 3.5.1 2018-07-05 local
## gridExtra * 2.3 2017-09-09 CRAN (R 3.5.0)
## gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
## haven 1.1.2 2018-06-27 CRAN (R 3.5.0)
## hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.5.0)
## jpeg 0.1-8 2014-01-23 CRAN (R 3.5.0)
## jsonlite 1.5 2017-06-01 CRAN (R 3.5.0)
## knitr 1.20 2018-02-20 CRAN (R 3.5.0)
## lattice 0.20-35 2017-03-25 CRAN (R 3.5.1)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.5.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
## methods * 3.5.1 2018-07-05 local
## modelr 0.1.2 2018-05-11 CRAN (R 3.5.0)
## munsell 0.5.0 2018-06-12 CRAN (R 3.5.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.5.1)
## pillar 1.3.0 2018-07-14 CRAN (R 3.5.0)
## pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.1)
## plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
## png 0.1-7 2013-12-03 CRAN (R 3.5.0)
## purrr * 0.2.5 2018-05-29 CRAN (R 3.5.0)
## R6 2.3.0 2018-10-04 cran (@2.3.0)
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.5.0)
## Rcpp 1.0.0 2018-11-07 cran (@1.0.0)
## readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
## readxl 1.1.0 2018-04-20 CRAN (R 3.5.0)
## RgoogleMaps 1.4.3 2018-11-07 cran (@1.4.3)
## rjson 0.2.20 2018-06-08 cran (@0.2.20)
## rlang 0.3.0.1 2018-10-25 CRAN (R 3.5.0)
## rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0)
## rvest 0.3.2 2016-06-17 CRAN (R 3.5.0)
## scales 1.0.0 2018-08-09 CRAN (R 3.5.0)
## stats * 3.5.1 2018-07-05 local
## stringi 1.2.4 2018-07-20 CRAN (R 3.5.0)
## stringr * 1.3.1 2018-05-10 CRAN (R 3.5.0)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.5.0)
## tidyr * 0.8.1 2018-05-18 CRAN (R 3.5.0)
## tidyselect 0.2.5 2018-10-11 cran (@0.2.5)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.5.0)
## tools 3.5.1 2018-07-05 local
## utils * 3.5.1 2018-07-05 local
## withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
## xml2 1.2.0 2018-01-24 CRAN (R 3.5.0)
## yaml 2.2.0 2018-07-25 CRAN (R 3.5.0)
[1] Full documentation of the data from the larger 2001-present crime dataset.
[2] Specifically burglary, motor vehicle theft, narcotics, and robbery.
[3] Specifically homicides, criminal sexual assault, and robbery. Aggravated assault and aggravated robbery are also defined as violent crimes by the Chicago Police Department, but the coding system for this data set does not distinguish between ordinary and aggravated types of assault and robbery.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.